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Abstract 

Gaussian random fields (GRFs) are important building blocks in hierarchical 
models for spatial data, but their parameters typically cannot be consistently esti¬ 
mated under in-fill asymptotics. Even for stationary Matern GRFs, the posteriors for 
range and marginal variance do not contract and for non-stationary models there is 
a high risk of overfitting the data. Despite this, there has been no practically useful, 
principled approach for selecting the prior on their parameters, and the prior typi¬ 
cally must be chosen in an ad-hoc manner. We propose to construct priors such that 
simpler models are preferred, i.e. shrinking stationary GRFs towards infinite range 
and no effect, and shrinking non-stationary extensions towards stationary GRFs. 

We use the recent Penalised Gomplexity prior framework to construct a practi¬ 
cally useful, tunable, weakly informative joint prior on the range and the marginal 
variance for a Matern GRF with fixed smoothness. We then introduce extra flexi¬ 
bility in the covariance structure of the GRF through covariates and discuss how to 
extend the stationary prior. We apply the priors to a dataset of annual precipita¬ 
tion in southern Norway and show that the scheme for selecting the hyperparameters 
of the non-stationary extension leads to improved predictive performance over the 
stationary model. 
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(a) p = cr^ = 1 (b) p = o^ = 1000 (c) p = o^ = 1000000 

Figure 1: Simulations with the exponential covariance function c{d) = for dif¬ 

ferent values oi p = using the same underlying realization of independent standard 
Gaussian random variables. The patterns of the values are almost the same, but the 
levels differ. 


1 Introduction 


Gaussian random fields (GRFs) provide a simple and powerful tool for introducing spa¬ 
tial or temporal dependence in a model and are fundamental building blocks in spatial 
statistics and non-parametric modelling, but even for stationary GRFs controlled only 
by range and marginal variance, the choice of prior distribution remains a challenge. The 
prior is difficult to choose: a well-chosen prior will stabilise the inference and improve the 
predictive performance, whereas a poorly chosen prior can be catastrophic. But since the 
parameters in the second-order structure of a GRF can affect the predictive distributions 
of the GRF in unexpected ways, it is difficult to construct good priors and the prior is 
typically chosen in an ad-hoc fashion. 

We focus on the case where the stationary part of the GRF is a Mat&n GRF with 
fixed smoothness, but the methods we develop are more widely applicable. The Matern 
GRF has a ridge in the likelihood along which the value of the likelihood decreases 


slowly (Warnes and Ripley 19871, and there is no consistent estimator under in-fill 
asymptotics for the range and the marginal variance (Stein, 1999 Zhang 2004). The 


behaviour of the prior on the ridge will strongly affect the behaviour of the posterior 
on the ridge and no matter how many points are observed in a bounded observation 
window, there is a limit to the amount of information that can be learned about these 
parameters. For example, if a one-dimensional GRF with an exponential covariance 
function is observed on the interval [0,1], it is only the ratio of the range and the marginal 
variance that can be estimated consistently, and not the range and the marginal variance 
separately (Ying, 1991| . When the range and marginal variance increase together, the 
marginal variance at each location changes, but conditional on the value at one location 
the remaining correlations between the locations change only slightly. Figure shows 
how the level moves, but the pattern of the points around the level remains stable for 
increasing values of the range and the marginal variance. 
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Since there is a connection between what can be learned from data and and what can 
affect the predictive distributions, the ratio of the range and the marginal variance is also 
the important quantity for the asymptotic predictions under in-hll when the exponential 
covariance function is used (Stein 19991. But even though intrinsic models can be useful, 
a practitioner who observes the values in Figure la is unlikely to believe that the ranges 
and marginal variances that can generate Figures [TE] and [T^ are correct even if the spread 
is consistent with the observed pattern. Therefore, we believe the practitioner should be 
provided with a prior that allows him/her to include expert knowledge, in an interpretable 
way, about how far the posterior should be allowed to move along the ridge. 

But to our knowledge, the only principled approach to prior selection for GRFs was 
introduced by Berger et al. (20011, who derived reference priors for a GRF partially 
observed with no noise. These priors fundamentally depend on the design of the exper¬ 
iment, which makes them inappropriate as “blind” default priors or when data is being 
analysed in a sequential fashion, and they have no option to include expert knowledge 
about the typical size of the spatial effect and are thus not apropriate for achieving 
credible intervals that are consistent with prior knowledge. 

The work on reference priors have been extended by several authors (]Paulo 2005 


Kazianka and Pilz 2012 Kazianka 20131 - critically Oliveira (20071 allowed for Gaussian 


observation noise - however, these papers have the same design dependency as the original 
work. Perhaps most importantly, the priors are not applicable for many hierarchical 
models because the assumption of Gaussian observation noise and linear covariates is 
insufficient in many applications and there are currently no extensions of spatial reference 
priors to other observation processes. In the more restricted case of a GRF with a 
Gaussian covariance function van der Vaart and van Zanten (20091 showed that the 


inference asymptotically behaves well with an inverse gamma distribution on range, but 
they provide no guidance on which hyperparameters should be selected for the prior. 

In practice, the range is commonly given a uniform distribution on a bounded inter¬ 
val, a log-uniform distribution on a bounded interval or an inverse gamma distribution 
with inhnite variance and the mean placed at an appropriate location. These priors have 
little theoretical foundation and are ad-hoc choices based on the idea that they will allow 
reasonable ranges, but Berger et al. (20011 noted that the posterior inference can be 
sensitive to the choice of cut-off for the uniform prior, and it is necessary with careful 
sensitivity analysis. The bounded intervals are necessary because improper prior distri¬ 
butions cannot be applied for the parameters of a GRF without great care as they tend 
to lead to improper posterior distributions. From a Bayesian modelling perspective this 
is an unsatisfactory situation because the prior is supposed to encode the user’s prior 
knowledge and uncertainty about the parameters, and not be an ad-hoc choice made out 
of convenience without theoretical justihcation. 

We propose to use the recent Penalised Gomplexity (PG) prior framework developed 
by Simpson et al. (20151 to construct a new, principled joint prior for the range and the 
marginal variance of a Matern GRF. The PG prior framework ignores the observation 
process entirely and focuses instead on the geometry of the parameter space induced by 
the inhnite-dimensional GRF. This is more technically demanding than considering hnite- 
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dimensional observations, like for the reference priors, but we are able to use the resulting 
prior for any spatial design and any observation process. The second key difference 
between the reference priors and the PC prior approach is that while the former is non- 
informative in a technical sense, PC priors are weakly informative and, therefore, include 
specific information from the user. In particular, PC priors need a point in the parameter 
space, considered a base model, and hyperparameters that indicates how strongly the user 
wishes to shrink towards the base model. Simpson et al. (20151 showed that the resulting 
inference was quite robust against the specification of the hyperparameters. 

This provides a way to limit how far the stationary GRF can move towards near- 
intrinsic models on the ridge, but usually one tends to believe that there also should 
be some degree of non-stationarity in the covariance structure of the GRF. However, 
adding this extra flexibility can lead to overfitting and one must be careful in selecting 
the prior on the extra flexibility. Since the covariance structure of a GRF is only observed 
indirectly through the values of the process and since there is no information about the 
covariances for locations without observations, the estimated covariance structure can 
be highly influenced by the types of non-stationarities allowed and the prior used. We 
extend the PC prior developed for the stationary Matern GRF to a prior for a non¬ 
stationary GRF where the non-stationarity is controlled by covariates. The prior is 
motivated by ^f-priors and shrinkage properties, and we consider one scheme for selecting 
the hyperparameters that reduces the risk of overfitting the non-stationary GRF. 

We start by deriving the new, joint PC prior for the range and the marginal variance 
of a Matern GRF with fixed smoothness parameter in Section Then in Section the 
frequentist coverage is studied through a simulation study and compared with the cov¬ 
erage obtained using the Jeffreys’ rule prior and ad-hoc uniform and log-uniform priors. 
In Section we study the behaviour of the joint posterior under the PC-prior and the 
Jeffreys’ rule prior and discuss the difference in behaviour. Then the frequentist proper¬ 
ties of spatial logistic regression are studied in Section]^ to demonstrate the applicability 
of the PC prior for a non-Gaussian observation process. In Section we discuss how 
to extend the prior for the stationary model to a prior for a non-stationary GRF and 
consider an application to annual precipitation in southern Norway. The paper ends with 
discussion and concluding remarks in Section 


2 Penalised complexity prior for the Matern GRF 

2.1 Background 

The principle idea of the PC-prior framework is to think of each model component as a 
flexible extension of a simpler, less flexible base model. For example, a random effect with 
non-zero variance is considered an extension of a random effect with zero variance, i.e. no 
random effect. After choosing the base model for the component, one derives a distance 
measure from the base model to the models described by other parameter values. This 
distance quantifies how much more flexible the model is for other parameter values than 
the base model, and provides a measure of complexity for the model component. The 
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prior is then set directly on the distance from the base model instead of on the parameters 
of the model. This provides an improved way of setting priors on parameters for which 
it is hard to have intuition. For example, correlation parameters close the border values 
— 1,0 and 1, or the range in spatial models. 

We measure the extra complexity of each model compared to the base model through 
the Kullback-Leibler divergence (KLD). The KLD of the probability density / from the 
probability density g is defined by 


DKM\g) = j /(®)log^^|^^ d®. 


and expresses the information lost when g is used to approximate /. The asymmetry of 
the KLD fits well with the asymmetric idea of preferring the base model, and we turn 
the KLD into a uni-directional distance from the base model g to the model / through 


d( /||g) = V2KLD (/||g). 

[Simpson et al. (20151 provide three principles for setting the prior on the distance: 
Occam’s razor, constant rate penalisation and user-defined scaling. Occam’s razor is 
achieved by constructing a prior that penalises deviations from the base model and 
favours the base model until the data provides evidence against it. This suggests that 
the prior density should have its peak at distance 0 and decrease with increasing distance. 
The constant rate penalisation is achieved by making the prior on the distance, d, satisfy 
the relationship 

7r{d + 6) _ s 


Tr{d) 


= r 


d,6 > 0, 


for a constant decay-rate 0 < r < 1. For this choice, the relative change in the prior 
when the distance increases by 5 does not depend on the current distance d, and the 
result is the exponential distribution vr(d) = Aexp(—Ad). 

The hyperparameter A is set based on the final principle of user-defined scaling. We 
transform the distance back to an interpretable size Q{d) and include prior information 
through, for example, P{Q{d) > U) = a oi P{Q{d) < L) = a, where t/ or L is an upper 
or lower limit, respectively, and a is the probability in the upper or lower tail of the 
prior distribution. By selecting U or L, and a the user combines prior belief with a prior 
derived from the geometry of the parameter space. 

We want to extend the approach outlined above to Gaussian Matern fields with fixed 
smoothnesses. We consider GRFs with the covariance function 


C{d) = 


1 

r(u)2'^-i 



where the spatial range p and the marginal variance cr^ must be estimated, and the 
smoothness u is considered fixed. However, in general, the KLD between two arbitrary 
choices of parameters (/Joj^o) infinite when the GRF is observed at all 

locations in a bounded observation window. And to facilitate the construction of the 
prior, we select a parametrization of the spatial field that more accurately describes 
what can be and what cannot be consistently estimated under in-fill asymptotics. 
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We introduce the parameters k = y/^/p and 


T = 




(47r)c*/2r(j/ + ’ 


which are a slight re-parametrization of the SPDE in Lindgren et al. (20111, 


— A)“/^(v^u(s)) = >V(s), s e 


( 1 ) 


where A = is the Laplacian and W is standard Gaussian white noise. If k 

and r are chosen as above, the SPDE specifies a Matern GRF with range p, marginal 
variance cr^ and smoothness u = a — d/2. 

In this parametrization r can be consistently estimated under in-fill asymptotics, 
whereas k cannot. If k is kept fixed and the value of r changes, the KLD is infinite, 
but if the value of r is kept fixed and the value of k changes, the KLD is finite. We 
use a two-step procedure for constructing the joint prior where we first set a prior for 
K through the KLD of the GRF and then set a prior on r given the value of k through 
a consideration of finite-dimensional observations. When constructing the prior for k, 
we use the base model k = 0, which corresponds to an intrinsic GRF similar to a spline 
model, and when constructing the prior for t\k, we use the base model r = oo for the 
fixed value of k. This leads to a combination of shrinkage towards infinite range and zero 
marginal variance. 


2.2 Joint prior for range and marginal variance 

We fix r and calculate the distance function for k using the intrinsic base model k = 0. 
The technical derivation is given in Appendix and results in a distance function 

d{K) oc (2) 


that provides a quantification of how far the distribution of the GRF is from the intrinsic 
GRF as a function of k. Since p = V^/k, Equation (|^ implies that the distance behaves 
as if the marginal variance increases in such a way that r is kept constant. 

The proportionality constant in the distance expression is not important and we set 
it to 1 and set an exponential prior on the distance, which leads to a prior on k given by 


7r(K) = Al exp(—Aid(K)) 




= ^ exp(—A ik'^'^^), k > 0, (3) 


where Ai can be selected by controlling the a priori probability that the range is below 
a specific limit, P{y/^/K < po) = ai. After selecting the lower range, pQ, and the 
probability in the lower tail, ai, the expression Ai = — log(Q!i)(po/\/^)^^^ can be used 
to calculate the associated hyper parameter. 

If the process were observed at all locations on a bounded observation window, the 
value of T would be exactly determined. A prior for r is needed because in practice we 
only observe a finite-dimensional part of the GRF. We make the assumption that we 
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are interested in observing finite-dimensional quantities that arise by applying a linear 
operator on the spatial field, i.e. 7r{u\K,T) oc exp(—The covariance matrix 
S only depends on k and not r, and the parameter r acts as the precision parameter. 
We use the PC prior for the precision parameter in a multivariate Gaussian constructed 
by Simpson et al. (2015[) (with base model r = oo), 


'^'Pc{'r) = ^T ^'^^exp(—A 2 T T > 0, 


(4) 


where we can set A 2 by controlling the a priori probability that the marginal variance 
exceeds a specific level, 

,2 


P I cr — 2i/ > ^0 


TK 


K = 02- 


(5) 


The constant 


C{l2) = 


T{u) 


r(i/ + d/2)(47r)'^/2 

is the constant needed to make the left-hand side of the inequality equal to the marginal 
variance of the GRF. 

Since the criterion in Equation ([^ is conditional on the value of k, it introduces 
dependence between k and r in the joint prior. We write Equation ([^ as 

C{u) 


P r < 


2iy„2 




K = 02 


and find 


exp -A 2 


c{y) 

2v„2 


a. 


-1/2N 


— 02, 




where A 3 absorbs the other constants in A 2 . We insert this into Equation (Q and find 
the conditional distribution 


-k{t\k) = exp(-A 3 K ‘'t 


2k^ 


( 6 ) 


This implies that the dependence between k and r is affected by the value of the smooth¬ 
ness ly. 

The joint prior on k and r is found by combining Equation ^ and Equation ([^, 
and is given by 


TT 


(k,t) = Tt{k)tt{t\k) = 3/2^d/2 1 1/2^^ 


There is a one-to-one correspondence between k and r, and p and cr^. 


p 


K. 



C{u) 
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which can be exploited to transform the joint prior for k and r to the joint prior for p 
and 

Main result 2.1. Joint prior for the range and marginal variance of a Matern GRF 


7t{p, 


i-d/2 

2 ^ 


exp 




exp (-Asa) , 


where A 4 and A 5 are selected according to the a priori statements 

P (/9 < po) = ua and > fig) = as, 

which give A 4 = —Pg'^^ log(a 4 ) and and As = — 


(7) 


3 Frequent ist coverage 


The series of papers on reference priors for GRFs starting with Berger et al. (20011 eval 


uated the priors by studying frequentist properties of the resulting Bayesian inference. 
If a prior is intended as a default prior, it should lead to good frequentist properties 
such as a frequentist coverage of the equal-tailed 100(1 — a)% Bayesian credible intervals 
that is close to the nominal 100(1 — a)%. The PC prior is not an objective prior and 
it is not intented to be non-informative, but the sensitivity of the posterior to the prior 
specification is important. We follow their design with one key difference: we do not 
include covariates and measurement noise. The reference priors are not proper distribu¬ 
tions and the goal of the series of papers was to derive them for different situations such 
as a spatial field combined with covariates, and a spatial field combined with covariates 
and Gaussian measurement noise. However, in this paper we are constructing a prior 
for the GRF component itself and we are not constructing a prior for the GRF together 
with covariates or together with covariates and Gaussian measurement noise. This is 
possible because the PC-prior is a proper distribution and can be applied to a spatial 
field together with covariates and arbitrary observation processes without breaking the 
properness of the posterior. 

The study uses an isotropic GRF, u, with an exponential covariance function c{d) = 
exp(—2d//3g) observed at the locations shown in Figure]^ The observation locations were 
randomly selected within the domain [ 0 , 1 ]^ and are distributed in an irregular pattern. 
The study is performed for two values of the nominal range: a short range, po = 0.1, 
and a long range, po = 1. We generate multiple realizations and for each realization we 
assume that the field is observed directly and fit the model 


yi = u{si), i = 1,2, ...,25, 


where n is a GRF with an exponential covariance function with unknown parameters p 
and (T^. We apply four different priors: the PC-prior (PriorPC), the Jeffreys’ rule prior 
(PriorJe), a uniform prior on range on a bounded interval combined with the Jeffreys’ 
prior for variance (PriorUnl) and a uniform prior on the log-range on a bounded interval 
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Figure 2: Spatial design for the simulation study of frequentist coverage. 


Table 1: The four different priors used in the study of frequentist coverage. The Jeffreys’ 
rule prior uses the spatial design of the problem through U = where S is the 

correlation matrix of the observations (See Berger et al. (20011). 


Prior Expression Parameters 

p, cr > 0 

PriorPC 7ri(p, a) = AiA 2 p“^exp (—Aip“^ — A 2 cr) Hyperparameters: 

PO) Ocr, 


PriorJe 


/ 1 \ 1/2 
7r2(p,cr) = CJ ^ (tr(C/^) - -tr(f7)^J 


p, cr > 0 

Hyperparameters: 

None 


p G [H, H], cr > 0 

PriorUnl 7r3(p, a) oc Hyperparameters: 

A, B 

p G [H, H], cr > 0 

PriorUn2 7r4(p, a) oc ■ p~^ Hyperparameters: 

A, B 
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combined with the Jeffreys’ prior for variance (PriorUn2). The full expressions for the 
priors are given in Table 

For each choice of prior and hyperparameters we generate 1000 observation vectors 
y = (yi)y 2 ) • • • 12 / 25 ) and estimate the equal-tailed 95% credible intervals for range and 
marginal variance for each observation by running an MCMC-chain. The number of times 
the true value is contained within the estimated credible interval is divided by 1000 and 
given as the estimate of the frequentist coverage. PriorJe has no hyperparameters, but 
PriorPC, PriorUnl and PriorUn2 each has hyperparameters that need to be set before 
using the prior. For PriorUnl and PriorUn2 it is hard to give guidelines about which 
values should be selected since the main purpose of limiting the prior distributions to 
a bounded interval is to avoid an improper posterior and the choice tends to be ad- 
hoc. For PriorPC, on the other hand, there is a intepretable expression to choose the 
hyperparameters, which helps give an idea about which prior assumptions the chosen 
hyperparameters are expressing. 

For PriorPC we need to make an a priori decision about the scales of the range and 
the marginal variance. The prior is set through four hyperparameters that describe our 
prior beliefs about the spatial field. We use 

P(/9 < po) = 0.05 

for Po = 0.025/?T) Po = O.lpxi Po = 0.4pT and po = l.OpT, where px is the true range. 
This covers a prior where po is much smaller than the true range, two priors where po is 
smaller than the true range, but not far away, and one prior where po is higher than the 
true range. For the marginal variance we use 

P{cr^ > a^) = 0.05, 

for (To = 0.625, (Tq = 2.5, (Tq = 10 and ao = 40. We follow the same logic as for range and 
cover too small and too large ao and two reasonable values. For PriorUnl and PriorUn2, 
we set the lower and upper limits for the nominal range according to the values A = 0.05, 
A = 0.005 and A = 0.0005, and B = 2, B = 20 and B = 200. Some of the values are 
intentionally extreme to see the effect of misspecification. 

The results for PriorPC are given in Tables and for the true ranges px = 0.1 and 
px = 1, respectively, and the tables for PriorUnl and PriorUn2 are given in Appendix |B| 
PriorJe resulted in 97.0% coverage with average length of the credible intervals of 0.86 for 
range and 96.0% coverage and average length of the credible intervals of 2.7 for marginal 
variance for po = 0.1, and 95.4% coverage with average length of the credible intervals 
of 445 for range and 94.4% coverage with average length of the credible intervals of 355 
for variance for po = 1. The results show that for PriorPC, PriorUnl and PriorUn2 
the coverage and the length of the credible intervals are dependent on the choice of 
hyperparameters. This is not surprising since there are few observation and there is a 
ridge in the likelihood where the posterior behaviour is strongly dependent on the the 
prior. The lengths of the credible intervals are, in general, more well-behaved for po = 0.1 
than for po = 1 because there is more information about the range available in the domain 
when the range is shorter. 
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Table 2: Frequentist coverage of 95% credible intervals for range and marginal variance 
when the range /jq = 0.1 using PriorPC, where the average lengths of the credible intervals 
are shown in brackets. 


(a) Range 


Po\o'0 

40 

10 

2.5 

0.625 

0.025 

0.768 [0.25] 

0.749 [0.24] 

0.760 [0.20] 

0.693 [0.17] 

0.1 

0.965 [0.35] 

0.976 [0.29] 

0.961 [0.27] 

0.937 [0.21] 

0.4 

0.990 [0.45] 

0.989 [0.41] 

0.993 [0.33] 

0.987 [0.25] 

1.6 

0.717 [0.98] 

0.692 [0.82] 

0.756 [0.54] 

0.807 [0.34] 



(b) Marginal variance 


/3o\o'0 

40 

10 

2.5 

0.625 

0.025 

0.941 [1.5] 

0.952 [1.4] 

0.957 [1.3] 

0.918 [0.97] 

0.1 

0.953 [1.6] 

0.966 [1.5] 

0.944 [1.4] 

0.927 [0.98] 

0.4 

0.953 [2.0] 

0.952 [1.8] 

0.960 [1.5] 

0.943 [1.1] 

1.6 

0.904 [3.9] 

0.906 [3.2] 

0.939 [2.2] 

0.972 [1.3] 


Table 3: Frequentist coverage of 95% credible intervals for range and marginal variance 
when the true range pQ = 1 using PriorPC, where the average lengths of the credible 
intervals are shown in brackets. 


(a) Range 


Po\o '0 

40 

10 

2.5 


0.625 

0.025 

0.950 [12] 

0.945 [7.1] 

0.906 

[3.2] 

0.821 [1.4] 

0.1 

0.977 [15] 

0.966 [8.2] 

0.962 

[3.6] 

0.866 [1.5] 

0.4 

0.965 [26] 

0.981 [13] 

0.992 

[5.1] 

0.988 [1.8] 

1.6 

0.159 [74] 

0.349 [31] 

0.700 

[11] 

0.954 [3.3] 



(b) Marginal variance 



Po\o'0 

40 

10 

2.5 


0.625 

0.025 

0.944 [11] 

0.956 [6.2] 

0.933 

[2.8] 

0.797 [1.1] 

0.1 

0.957 [13] 

0.966 [7.2] 

0.954 

[3.1] 

0.865 [1.2] 

0.4 

0.943 [23] 

0.957 [11] 

0.987 

[4.4] 

0.972 [1.5] 

1.6 

0.441 [68] 

0.534 [29] 

0.797 

[9.1] 

0.984 [2.5] 
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For PriorUnl the coverage and the length of the credible intervals are strongly depen¬ 
dent on the upper limit in the prior. The prior has the undesirable property of including 
stronger and stronger prior belief in high ranges when the upper limit is increased. In 
practice the upper limit is unlikely to be chosen as extreme as in the example, but it 
verifies the observation of Berger et al. (20011 that the inference is sensitive to the hy¬ 
perparameters for this prior. For PriorUn2 the coverage is good in both the short range 
and long range case, but the lengths of the credible intervals are sensitive to the upper 
limit of the prior. 

The results show that the coverage for PriorPC is stable when a too low lower limit 
for range or a too high upper limit for variance is specified, but that specifying a too high 
lower limit for the range or a too low upper limit for variance produces large changes 
in the coverage. This is not unreasonable as the prior is then explicitly stating that the 
true value for range or variance is unlikely. The average length of the credible intervals 
are more sensitive to the hyperparameters than the coverages, but we see less extreme 
sizes for the credible intervals than for PriorJe. 

The coverage of PriorJe is good, but the credible intervals seem excessively long 
and the prior is more computationally expensive than the other priors. PriorJe is only 
computationally feasible for low numbers of points since there is a cubic increase in 
the complexity as a function of the number of observations. The average length of the 
credible intervals for pQ = 1 for marginal variance is 355, which imply unreasonably 
high standard deviations. The high standard deviations do not seem consistent with an 
observation with values contained between —3 and 3. We study the credible intervals for 
PriorPC and PriorJe closer for a specific realization in the next section to gain intuition 
about why this happen. 

With respect to computation time and ease of use versus coverage and length of 
credible intervals PriorUn2 and PriorPC appear to be the best choices. If coverage is the 
only concern, PriorUn2 performs the best, but if one also wants to control the length 
of the credible intervals by disallowing unreasonably high variances, PriorPC offers the 
most interpretable alternative. In a realistic situation it is probable that the researcher 
has prior knowledge, for example, that the spatial effect should not be greater than, say 
4, and by encoding this information in PriorPC we can limit the upper limits of the 
credible intervals both for range and marginal variance. 


4 Behaviour of the joint posterior 

The extreme length of the credible intervals seen for the Jeffreys’ rule prior are not entirely 
surprising due to the ridge in the likelihood, but they are troubling for interpreting 
the parameters. In the previous section we only looked at properties of the marginal 
credible intervals, but these do not tell the entire story because there is strong dependence 
between range and marginal variance in the joint posterior distribution. We study this 
dependence by studying the posterior distribution for the realization shown in Figure 
The true range used to simulate the realization is 1. We use an MCMC sampler to draw 
samples from the joint posterior using the PC-prior with parameters ap = 0.05, po = O-lj 
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Figure 3: One realization of a GRF with the covariance function c{d) = exp{—2d) at 25 
selected locations. 


cto- = 0.05 and ag = 10, and we draw samples from the joint posterior using the Jeffreys’ 
rule prior. 

Figure 1^ shows that the upper tails of the posteriors when using the Jeffreys’ rule 
prior are heavier than the upper tails of the posteriors when using the PC-prior. The 
lower endpoints of the credible intervals are similar for both priors, but there is a large 
difference in the upper limits because the likelihood decays slowly along the ridge. The 
marginal posterior distributions do not show the full story about the inference on range 
and marginal variance because the two parameters are strongly dependent in the posterior 
distribution. The PC-prior for the range has a heavy upper tail and the upper tail of 
the posterior for the range is controlled through the prior on the marginal variances. 
The large difference in the marginal posteriors for the nominal range in Figure [4a] can be 
explained by the behaviour of the joint posterior. 

Figure]^ shows the strong posterior dependence between nominal range and standard 
deviation in the tail of the distribution. Stein (19991 showed that the ratio of range and 
marginal variance is the important quantity for asymptotic predictions with the expo¬ 
nential covariance function. So the long tails are not a major concern for predictions, but 
they pose a concern for the interpretability of range and marginal variance. The values 
of all the observations in Figure]^ he within the range —1 to 3 and it is unlikely that the 
true standard deviation should be on the order of 20. Intrinsic models have a place in 
statistics, but the results show that the Jeffreys’ rule prior has the, potentially, undesir¬ 
able behaviour of favouring intrinsic CRFs with large marginal standard deviations and 
ranges even though they might not be physically reasonable. The PC prior offers a way 
to introduce prior belief about the size of the marginal standard deviations, and thus a 
way to reduce the preferance for the intrinsic CRFs. 
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Log(Range) 

(a) Posterior for the logarithm of range 


Log(Standard deviation) 

(b) Posterior for the logarithm of marginal standard 
deviation 


Figure 4: Marginal posteriors of the logarithm of range and the logarithm of marginal 
standard deviation. The dashed lines shows the posterior and the credible intervals when 
the PC-prior is used and the solid line shows the posterior and the credible intervals when 
the Jeffreys’ rule prior is used. 



Figure 5: Samples from the joint posterior of range and marginal standard deviation. 
The red circles are samples using the PC-prior and the black circles are samples using 
the Jeffreys’ rule prior. 
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5 Example: Spatial logistic regression 


What makes the PC prior more practically useful than the reference prior, beyond the 
computational benefits and the interpretability, is that the prior is applicable in any hier¬ 
archical model and does not have to be re-derived each time a component is removed or 
added, or the observation process is changed. We consider a simple spatial logistic regres¬ 
sion example to demonstrate the applicability of the PC prior beyond direct observations 
or Gaussian measurement noise. 

We select the 25 locations in Figure]^ and generate realizations from the model 
yi\pi ~ Binomial(20,pi), z = 1, 2,..., 25, 

where 

probit(pi) = u{si), 

where n is a GRF with the exponential covariance function with parameters p = 0.1 and 
cr = 1. For each realization the parameters p and cr^ are assumed unknown and must be 
estimated. The posteriors of the parameters are estimated with an MCMC chain and the 
equal-tailed 95% credible intervals are estimated from the samples of the MCMC-chain 
after burn-in. We repeat the procedure above 500 times and report the number of times 
the true value is contained in the credible interval and the average length of the credible 
interval. 

The experiment is repeated for 64 different settings of the prior: the hyperparameter 
po varies over po = 0.0025,0.01,0.04,0.16 and the hyperparameter (Tq varies over ag = 
40,10, 2.5,0.625. This covers a broad range of values from too small to too large. The 
values in Table are similar to the values in Table [^except that the credible intervals are 
slightly longer. The longer credible intervals are reasonable since the binomial likelihood 
gives less information about the spatial field than direct observation of the spatial field. 
The coverage for marginal variance is good even for grossly miscalibrated priors, but the 
coverage for range is sensitive to bad calibration for range and the coverage is somewhat 
higher than nominal for the well-calibrated priors. This is a feature also seen in the 
directly observed case in Section 

6 Prior for extra flexibility in the covariance structure 

Neither stationary nor non-stationary CRTs provide true representations of reality, but 
the extra flexibility in the covariance structure of a non-stationary GRF may provide a 
better fit to the data than the less flexible covariance structure of a stationary GRF. We 
will not be working under the stationary/non-stationary dichotomy, but instead under 
the more pragmatic viewpoint that all datasets are non-stationary to some degree. As 
long as the parametrization of the non-stationary GRF contains the stationary GRF as a 
special case, the added flexbility should improve the predictions if the prior is conservative 
enough to prevent the model from overfitting. 

expanded the parameters of the SPDE in Equation Q 
into spatially varying functions, log(fi:(-)) and log(r(-)), through low-dimensional bases 


Ingebrigtsen et al. (2014 
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Table 4: Frequentist coverage of the 95% credible intervals for range and marginal vari¬ 
ance when the true range is 0.1 and true marginal variance is 1, where the average length 
of the credible intervals are given in brackets, for the spatial logistic regression example. 

(a) Range 


Po\o'0 

40 


10 

2.5 


0.625 


0.025 

0.804 

[0.29] 

0.790 [0.24] 

0.774 

[0.22] 

0.726 

[0.19] 

0.1 

0.974 

[0.41] 

0.986 [0.37] 

0.974 

[0.33] 

0.956 

[0.24] 

0.4 

0.996 

[0.61] 

0.982 [0.57] 

0.996 

[0.43] 

0.992 

[0.30] 

1.6 

0.648 

[1.4] 

0.604 [1.2] 

0.722 

[0.67] 

0.762 

[0.44] 




(b) Marginal variance 




Po\o'0 

40 


10 

2.5 


0.625 


0.025 

0.942 [2.0] 

0.946 [1.9] 

0.948 

[1.7] 

0.912 [1.2] 

0.1 

0.920 [2.3] 

0.942 [2.0] 

0.964 

[1.8] 

0.922 [1.2] 

0.4 

0.952 [2.7] 

0.962 [2.4] 

0.968 

[1.9] 

0.928 [1.2] 

1.6 

0.904 [5.3] 

0.936 [4.1] 

0.966 

[2.7] 

0.982 [1.5] 


using a small number of covariates, and used independent Gaussian priors for the extra 
parameters. However, they experienced numerical problems and prior sensitivity, and 


Ingebrigtsen et al. (20151 developed an improved scheme for selecting the hyperparame¬ 


ters of the priors based on the properties of the resulting spatially varying local ranges 
and marginal variances. However, the inherent problem of their specificiation is that k(-) 
affects both the correlation structure and the marginal variances of the spatial field. This 
makes it challenging to set priors on k(-) and r(-), and we aim to improve their proce¬ 
dure by first improving the parametrization of the non-stationarity, and then developing 
a prior using the improved parametrization. 


6.1 Parametrizing the extra flexibility 

Instead of adding spatial variation to the coefficients of the SPDE in Equation ([^, k and 
T, one can vary the geometry of the space in a similar way as the deformation method. 
If E is the Euclidean space M^, the simple SPDE 

(1 - Ae)?x(s) = V^We{s), sgE, (8) 

generates a stationary Matern GRF with range p = \/8) marginal variance = 1, and 
smoothness u = 1. We introduce spatially varying distances in the space by giving the 
space geometric structure according to the metric tensor g(s) = i?(s)“^l 2 , where R{-) is 
a strictly positive scalar function. This locally scales distances by a factor R{s)~^, 




[dsi 


dS2] g(s) 


dsi 

dS2 


R{s) 2(dsi-hds2), 


(9) 
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where du is the line element, and si and S 2 are the two coordinates of E = R^. 

The non-stationarity in the correlation structure is then described through the spa¬ 
tially varying geometry in Equation ([^ , which results in a curved two-dimensional mani¬ 
fold that must be embedded in a space with dimension higher than 2 to exist in Euclidean 
space. The resulting spatial field does not have exactly constant marginal variance be¬ 
cause the curvature of the space is non-constant unless i?(-) does not vary in space, 
but there will be less interaction between R(-) and the marginal variance than between 
k(-) and the marginal variance. And when R(-) varies slowly, the variation in marginal 
variances is small. 

We can relate the Laplace-Beltrami operator in E to the usual Laplacian in 
through 




1 

\/det(5r) 


V]r 2 • (-\/ det{g)g ^Vr 2 ) — R{s)‘^A^ 2 , 


and the Gaussian standard white noise in E to the Gaussian standard white noise in 
through 


We{s) = det( 5 )^/^>VM 2 (s) = R{s) ^>V]r 2 (s). 


Thus the equivalent SPDE in can be written as 


R{s)-‘^ [1 - R{sfA^ 2 \ u{s) = R{s)-^^W^2{s), s e 

where the first factor is needed because the volume element dVe = Y^det(g')dl 4 i 2 . The 
SPDE can be written as 


{R{s) — A-^ 2 )u{s) = \f^R{s) ^>V]r 2 , s G M^, (10) 


in Euclidean space, but we can interpret the non-stationarity through the implied metric 
tensor. The procedure is similar to the simple reparametrization k(-) = but the 

extra factor on the right-hand side of the equation reduces the variability of the marginal 
variances due to changes in k{-). 

For example, the space [0, 9] x [0, 3] with the Euclidean distance metric can be visu¬ 
alized as a rectangle, which exists in M^, or as a half cylinder with radius S/vr and height 
9, which exists in but if the space is given the spatially varying metric tensor defined 
by the local range function 


i?(si, S 2 ) 


1 0 < Si < 3,0 < S 2 < vr, 

< (si — 2) 3 < Si < 6 ,0 < S 2 < vr, 

4 6 < Si < 9,0 < S 2 < vr. 


( 11 ) 


the space cannot be embedded in M^. With this metric tensor, the space is no longer flat, 
and must be embedded in as, for example, the deformed cylinder shown in Figure]^ 
Thus, solving Equation (10) with the spatially varying coefficient is the same as solving 
Equation ([^ on the deformed space. This means that unlike the deformation method, a 
spatially varying i?(-) does not correspond to a deformation of to M^, but rather from 
to a higher-dimensional space. 
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Figure 6: Half cylinder deformed according to the spatially varying metric tensor. The 
lines formed a regular grid on the half cylinder before deformation. 


Since the variation in the marginal variances due to variations in the local ranges 
is small if R{-) does not vary too much, we introduce a separate function S{-) that 
controls the marginal standard deviations of the process and limit the SPDE to a region 
of interest, D, with Neumann boundary conditions. 


-A 


V) = V^Ris) ^>V]r2(s), 


s gV. 


This introduces boundary effects as was discussed in the paper by Lindgren et al. (2011), 
but we will not discuss the effects of the boundary in this paper. 

This SPDE allows for greater separation of the parameters that affect correlation 
structure and the parameters that affect marginal standard deviations than the previous 
approach, and demonstrates the usefulness of careful consideration of how the spatially 
varying behaviour is introduced and parametrized. The SPDE derived based on the 
metric tensor allows for separate priors for correlation structure and marginal standard 
deviations through expansions of log(ii(-)) and log(S'(-)) into bases. 


6.2 Setting priors on the parameters 

There are two sources of non-stationarity in the flexbile extension from stationarity: a 
function R{-) that controls local range and a function S'(-) that controls the marginal 
standard deviation. The degree of flexibility in each of these sources of non-stationarity 
must be controlled to limit the risk of overfitting. Due to the issues of infinite KLDs 
discussed in Section |2.1[ we will not follow the PC prior framework, but instead use a 
construction motivated by the principles of the PC priors to make the non-stationary 
model contract towards a base model of stationarity. Denote by 6 the extra parameters 
added to the GRF that move the model away from the base model of stationarity, 6 = 0. 
The prior on 6 will be constructed conditionally on the parameters of the stationary 
GRF, p and and for each choice of these parameters, 6 should shrink towards 0. 
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We parametrize the local distance, R{-), and the approximate marginal standard 
deviations, S{-), through 


log{R{s)) = log ( ^ 




ni 


log(5'(s)) = log(cr) + 


n 2 

E 

i=l 


s e 2?, 
6 * 2 ,i/ 2 ,i(s), s £V, 


( 12 ) 


where {/i,i} is a set of basis functions for the local range centred such that I)© = 0 , 
for i = l,2 ,...,ni, and {f 2 ,i} is a set of basis functions for the marginal standard 
deviations centred such that (/ 2 ,i, 1)d = 0 for i = 1,2,..., n 2 - We collect the parameters 
in vectors 6 i = ( 0 i_i,...,J and 62 = { 02 , 1 , ■ ■ ■ ,G 2 ,n 2 ) such that 61 controls the 
non-stationarity in the correlation structure and O 2 controls the non-stationarity in the 
marginal standard deviations. 

We want the prior for each source of non-stationarity to be invariant to scaling of the 
covariates and to handle linear dependences between the covariates in a reasonable way. 


and we follow the basic idea of the g-priors (Zellner, 1986) (with g = 1), 
0i|ri ~ A2(0, and 02 |'r 2 ~ A2(0, 


.-ic-i-' 

Oo I 


where S*! is the Gramian, 




{fi,i, /r 


3/V 


( 1 , 1 ) 


V 


and S 2 is the Gramian, 




{f 2 ,i, / 2 ,, 


J/V 


( 1 , 1 ) 


for i,j = 1 , 2 ,... ,ni, 


for i,j = 1 , 2 ,... ,712. 


V 


In this set-up the Gramians account for the structure in the basis functions and the 
strictness of the priors are controlled by two precisions parameters ti and T 2 - If the pre¬ 
cision parameters are fixed hyperparameters, the resulting priors are Gaussian. However, 
the Gaussian probability density is flat at zero due to the infinite differentiability of the 
density function, and we prefer a prior that has a spike at zero. 

This can be achieved by selecting the hyperpriors to be the PG prior for the precision 


parameter in a Gaussian distribution (Simpson et al. 2015), which is designed to shrink 


towards a base model of no effect. We combine the selection for the hyperpriors with an a 
priori ansatz that the independence between the correlation structure and the marginal 
variance in the prior for the stationary model also can be applied to the non-stationarity. 


/ s Al -3/2 
7r(ri) = exp 


(-Ah and 7r(T2) = ^^%xp (-A2r2 . 


These hyperpriors for the precision parameters have so heavy tails that integrating them 
out will introduce infinite spikes in the marginal priors for 0 i and O 2 at zero. 
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The hyperparameters Ai and A 2 control the spread of the priors and can be selected 
either based on expert knowledge or on frequentist properties. The parameters 61 and 62 
give multiplicative effects to local range and marginal standard deviations, respectively, 
and one possibility is to control the size of the multiplicative effect through 


Prob ( max 
\sex> 

Prob 1 max 

\s&V 




log 


5(s: 




>Ci 


>C2 


p,a 


p,a 


= Prob 


max 

\s^V 

log 1 

max 

\s^V 

log I 


' R{s) \ 

.p/V^J 


‘S{s] 


> Cl ) — ai, 




> C 2 I = a2- 


One can see from Equation (12) that the relative differences do not depend on the parame¬ 
ters of the stationary model, and the full prior factors as tt{p, cj^, 9) = 7r(/9)7r((T^)7r(0i)7r(02)- 
In practice, it is difficult to have an informed, a priori opinion on the non-stationary 
part of the model, but the hyperparameters Ai and A 2 can be chosen in such a way 
that they give a conservative prior. Since stationarity is our base model and the non- 
stationarity is provided as extra flexibility, we will require that the hyperparameters 
are set such that the inference behaves well when the true data-generating distribu¬ 
tion is stationary. We propose to set the hyperparameter by first fitting the stationary 
model, using the maximum aposteriori estimate of the parameters to make multiple sim¬ 
ulated datasets from the stationary GRF and nugget effect, fit the non-stationary GRF 
with a nugget effect to each dataset, and calculate the frequentist coverage of the non- 
stationarity parameters. The hyperparameters can then be set such that the coverage 
of the credible intervals of the non-stationary parameters is close to nominal coverage. 
This ensures that the prior provides enough regularization that each posterior marginal 
for the non-stationarity parameters do not suggest non-stationarity when a stationary 
data-generating function is used. 


6.3 Example: Annual precipitation in southern Norway 


We use a dataset consisting of total annual precipitation for the one year period Septem¬ 
ber 1, 2008, to August 31, 2009, for the 233 measurement stations in southern Norway 
shown in Figure]^ The dataset was used by Ingebrigtsen et al. (2014 2015) to study 
the use of covariates in the covariance structure and associated priors. They used an 
intercept and a linear effect of the elevations of the stations in the first-order structure 
and used the elevation as a covariate in the second-order structure. We will follow their 
choice of covariates in the first-order structure, but add the magnitude of the gradient 
of the elevation in addition to the elevation as a covariate in the second-order structure. 
The additional covariate is added to demonstrate that the proposed prior deals appro¬ 
priately with multiple covariates and to see the effect of the additional covariate on the 
fit of the model. 

We use a simple geostatistical model 


yi = l3o + XiPi -h u{si) + ei, 


i = 1,2,..., 233, 


(13) 
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Figure 7: Total precipitation for the one year period September 1, 2008, to August 31, 
2009, for 233 measurement stations in southern Norway measured in meters. Coordinate 
system is UTM33. 
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Figure 8: Mesh used for the SPDE approach. 


where for station i, t/i is the observation made at location Sj, Xi is the elevation of the 
station, (/3o,/3i) are the coefficients of the fixed effects, «(•) is the spatial effect, and 
Cj is the nugget effect. The nuggets are i.i.d. ~ AA(0 ,ct^), and the spatial effect is 
constructed with the SPDE approach ( ]Lindgren et al. [2011 ) and the stationary version 
has two parameters: spatial range p and marginal variance of the spatial field 

In the SPDE approach the spatial field is defined on a triangular mesh and the values 
of the spatial field within the triangles are defined through linear interpolation based 
on the values on the nodes of the mesh. This means that the elevation covariate in the 
first-order structure is only needed at each observation location, but that the elevation 
and gradient covariates in the second-order structure are needed at every location within 
the triangulation. We use the mesh shown in Figure]^ and project elevation and gradient 
values from the high resolution digital elevation map GLOBE ( [Hastings et al. , [l999 ) onto 
the mesh. The projection is piece-wice linear on each triangle of the mesh and minimizes 
the integrated square deviation over the domain covered by the mesh. This results in the 
piece-wise linear covariates shown in Figure]^ 

The stationary model is given priors that satisfy 


P(p < 10) = 0.05, P(o-s > 3) = 0.05 and P(cJn > 3) = 0.05, 


and fitted to the data with INLA ( ]Rue et al. 20091. With this prior we consider a 
standard deviation greater than 3 large for both the process and the nugget effect, and a 
range less than 10 km unlikely based on the spatial scale that we are working on. We are 
not interested in the stationary model beyond using the MAP estimates, (Tn = 0.13, p = 
219 and a = 0.72, for setting the hyperparameters in the prior for the non-stationarity. 
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Figure 9: The covariates (a) 


elevation and 


(b) magnitude of the gradient used for the 


covariance structure. 


and to compare the predictions scores with that of the non-stationary model. 

The coefficients, Oi, of the two linear covariates in log(i?(-)) are given the prior 

6i\ti ~ V ( 0 , 

-3/2 

Tl ~ —e 1/V 1 

as described in the previous section, and the coefficients, 82 , of the two linear covariates 
in log(S'(-)) are given a similar prior, but with hyperparameter A 2 . The non-stationary 
model is more difficult to fit in the INLA framework than the stationary model because 
the priors for 0i and 62 have infinite spikes in 0 that makes the posteriors non-Gaussian 
in the area around the origin. The optimization can be improved by reparametrizing as 
0 'i = 81 and 82 = 02 \/^ 2 ! marginal posteriors will not be sufficiently peaked 

at the origin and will miss the multimodality that should be present when there is a mode 
close to zero. However, we still use INLA as a fast approximation for the repeated fitting 
of the datasets needed for selecting the hyperparameters of the prior for non-stationarity. 

Specificially, we use the MAP estimates of the stationary model to simulate 100 
datasets from the stationary model with /?o = /?i = 0, set values for the hyperparameters 
Ai and A 2 , fit a non-stationary model with /3o = /3i = 0 to each of datasets, and cal¬ 
culate the frequentist coverage of the the 95% credible intervals of the non-stationarity 
parameters. We tried several values for the hyperparameters Ai and A 2 and found that 
= A 2 = 20 provides coverage that is close to the nominal 95%. These values are used 


for the hyper parameters when fitting the data to the full model in Equation (13) with a 


non-stationary spatial effect. The lack of some mass at zero for the posteriors calculated 
by INLA is likely to make the procedure overly concervative. 

The non-stationary model was then fitted using an MCMC sampler and the resulting 
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Figure 10: Posterior mean of (a) range and (b) 


standard deviation. 


posterior means of the range and the standard deviation are shown in Figure 10 From 
Figure 11 one can see that the spatially varying range and standard deviation leads to 
non-stationarity in the correlation structure and the marginal standard deviations of the 
spatial effect. However, the effect in standard deviations appear to be stronger than 
the effect of the spatially varying range. The posteriors for the multiplicative effects 


on the stationary range and standard deviation for the western location in Figure 11a 


shown in Figure 12 shows that the effects are signihcant in that location. The posterior 
probabilities for the effects to be less than 1 and greater than 1 are 99% and 99%, 
respectively. This shows that the the more flexible non-stationary model is preferring to 
move away from the stationary model even under a conservatively selected prior. 

A simple estimator of the the leave-one-out log-score can be made directly from the 
samples of the MCMC sampler, and we hnd the score 0.13 for the stationary model and 
0.22 for the non-stationary model. The leave-one-out CRPS can be calculated based on 
a Gaussian approximation of the posterior by matching the mean and variance like done 
by Ingebrigtsen et al. (2014 20151. This results in a CRPS of 0.111 for the stationary 
model and a CRPS of 0.101 for the non-stationary model. These values are lower than 
0.1267 and 0.1241, respectively, that was found in Ingebrigtsen et al. (20141, but their 
values were based on 13-fold cross validation. The CRPS in the later paper Ingebrigtsen| 


et al. (20151 are not directly comparable as they are based on hts of multiple years 


of data. The non-stationary GRF leads to better scores, and experimenting with the 
strictness of the prior showed that further improvements were possible by making the 
prior weaker, but that making the prior too weak leads to worse scores. The prior 
and the procedure for selecting the hyperparameters appears to introduce a reasonable 
level of conservativeness for this dataset. The estimates for CRPS without using the 
approximation for the predictive distributions are 0.092 for the stationary model and 
0.083 for the non-stationary model, which means that a Gaussian approximation to the 
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Figure 11: Covariance structure described through (a) 0.90, 0.57, 0.36 and 0.22 level 
curves of correlation with respect to the two locations marked with red crosses and fl 
marginal standard deviations. 




(a) Multiplicative effect on range (b) Multiplicative effect on standard devia¬ 
tion 


Figure 12: Posteriors of the multiplicative effect on the stationary (a) range and (b) 
standard deviation at the western location in Figure 


11 a 
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predictive distribution is sub-optimal. 

Figure pT] gives the impression that the non-stationarity in the correlation structure 
is weak and that the non-stationarity in the marginal standard deviations is clearer. 
If we run the model with the same hyperparameters and remove the non-stationarity 
in the local range, the CRPS is 0.086, and if we remove the non-stationarity in the 
marginal standard devations, the CRPS is 0.081. This shows that the covariates in the 
local range are contributing more to the improved predictions than the covariates in 
the standard deviation, and that using all four covariates has degraded the performance 
slightly compared to using only a non-stationary local range. This demonstrates how 
difficult it is to ensure that adding covariates in the second-order structure always leads to 
better predictions. A conservative prior helps avoiding severe overfitting, but if prediction 
is the main goal, the prediction scores of the models should be compared. 

7 Discussion 

The guiding principle for constructing priors in this paper has been to limit flexibility. 
The stationary GRF is an extension of no spatial field and the prior should shrink towards 
no effect, and the non-stationary GRF is an extension of the stationary GRF and the prior 
should shrink towards stationarity. This idea of a base model and a flexible extension of 
that base model is one of the central points of the PC prior framework, and the prior 
on the range and the marginal variance was constructed directly within this framework, 
whereas the prior on the non-stationary extension was guided by the framework. 

The main issue in each case is that there is not enough information about the param¬ 
eters of the GRF under in-fill asymptotics. Even for the two-parameter stationary model 
the posteriors do not contract, and for more complex non-stationary GRFs one risks se¬ 
vere overfitting. The constructed priors provides the opportunity to limit the flexibility 
by introducing prior knowledge about the process and limiting the risk of overfitting. 
Additionally, the joint PC prior for the range and the marginal variance for the Matern 
GRF is easy to implement and quick to compute, works with any observation process and 
can be used in hierarchical models, and does not depend of the sampling design of the 
process. Further, the stationary prior can be extended to a prior for a full non-stationary 
GRF that has covariates both in the correlation structure and in the marginal standard 
deviations. 

Setting the hyperparameters for the stationary part of the model can be done based 
on statements about what constitutes a large standard deviation and what constitutes 
a small range. This allows the users to choose to limit the movement towards intrinsic 
models along the ridge in the likelihood and thus provide more sensible posterior inference 
for the problem at hand. We observe sensitivity when one misses by one order in the 
prior specification or set too high lower limits or too low upper limits, but in return the 
credible intervals are shorter than for the objective priors. 

When setting the hyperparameters for the non-stationary GRF, it is difficult to elicit 
expert knowledge since the second-order structure is not observed directly, and we discuss 
an alternative way to set the hyperparameters based on the frequentist coverage of the 
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credible intervals. Using the new prior and the associated scheme for selecting the hy¬ 
perparameters, we find a better fit for the non-stationary GRF than with the stationary 
GRF applied to the dataset of annual precipitation in southern Norway measured both 
in leave-one-out GRPS and log scores. 

The paper shows that the PG priors provide a helpful tool for setting priors on 
parameters for which it up until now has been difficult to come up with theoretically 
founded priors. But also that the ideas of the framework are helpful for constructing 
priors that limit flexibility even when the exact derivation is not possible. 
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A Derivation of the Kullback-Leibler divergence for k 

We parametrize the Mat&n GRF through k = jp and 


r = 


Tiu) 


(47r)i^/2r(j/ + dj2)a‘^K‘^‘' ’ 

where p is the range, ci^ is the marginal variance, v is the smoothness, and d is the 
dimension of the base space. If r is fixed and the process is observed on a bounded 
observation window, the Kullback-Leibler Divergence (KLD) between the distributions 
described by k = kq > 0 and n = Ki > 0 is finite and it is possible to use the KLD 
to quantify how different the distributions are. The KLD of the probability measure Q 
from the probability measure P is defined by 


^kl(RIIQ) = 




(14) 


where AP/dQ is the Radon-Nikodym derivative of P with respect to Q, and expresses 
the information lost when Q is used to approximate P. 


Fix r and i/, and let denote the Matern GRF with parameter k. Lindgren et al. 


(|2011 ) showed that this GRF is a solution of the stochastic partial differential equation 
(SPDE) 

(k 2 - = W(s), (15) 

where a = u + d/2 and W is standard Gaussian white noise, and that the spectral density 
of Ui^ is given by 


A('^) = ^ 


27r J 


(16) 


We approximate the GRF with the solution Uk of SPDE (15) restricted to the 
domain V = [—L/2,L/2]'^ with periodic boundary conditions. The spectral density of 
Uk is discrete and the spatial field can be written as 

u^(s)= 

where {z^} are independent Gaussian random variables with variances given by 

1 Var[(>V,e*<2’^''/'^.«>)25] 


Afc(K) = 


t(k 2 + ||27rfc/L||2)« (^Qi{2nk/L,s) ^Qi{2nk/L,s)'^2^ 

1 1 


t{k? + I |27rfc/L| p)“ 

The KLD between 11^ and 11^0 (based on Bogachev (|1998 Thm. 6.4.6)), 


(17) 


2KLD(k||ko) = 

kezd 

= E 


A^J 

{4 + \\2nk/L\\^r 


{k? + ||27rfc/L|| 


2\a 


- 1 - log 


(Kg + ||27rfc/L|p)» 
{k? + ||27rfc/L|P)“ 


(18) 
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is a simple expression involving only the spectral densities of the processes. The sum can 
be turned into a Riemann sum by scaling it with the step-size to the power of d, 


^ ) KLD(/c||ko) 


E 

fceZ'^ 


I) 

f^jw) 


{kI -F ||27rfc/L|| 

L(K2 + ||27rfc/L||2 

fK{w) ' 


2\a 


- 1 - log 


(K2 + ||2vrfc/L||2)« 


- 1 - log 


/ko(^) 


(k 2 + ||27rfc/L||2) 

dm -I- E{L, Ko), 


where E{L, kq) is the error in the Riemann sum. 

We are interested in the case ko = 0, and the Riemann sum will only converge if 
the limits kq 0 and L —>• oo are taken in such a way that the zero-frequency term 
converges. The zero-frequency term 


27r 

T 


Kr 




— 1 — a log 


converges to zero if L tends to infinity in such a way that L = o(kq ^), and we apply this 
relationship between L and kq and introduce the scaled KLD 


27r\' 


1 


KLD(k|| 0) = lim — RLDIkIIko) = - 
Ko->o \ L / 


(w'^w) 


(k2 -|- w'^wY 


- 1 - log 




(^2 -g 


dm. 


We perform the change variables w = ny and find that 


KLD(k||0) = ^ 




Yi + y'^yY 


- 1 - log 


{y^yf 


(1 -F 


E^dy = k‘^KLD(1||0) oc 


(19) 


if KLD(1||0) exists. 

KLD(1||0) can be expressed an an integral in n-dimensional spherical coordinates, 


KLD(1||0) = Cd 


/o 


r 

1 -g r2 


- 1 - log 


r 

1 -g r2 


„d-l 


dr, 


( 20 ) 


where Cd is a constant that varies with dimension. There are two issues: the behaviour 
for small r and the behaviour for large r. For d = 1, 

roa ^2 

KLD(1||0) < —Cio: / log-^dr = vraCi, 

Jo 1 + ^2 

and we can conclude that the behaviour around 0 is not a problem for any d > 1. The 
behaviour for large r can be studied through an expansion in (1 -g r2)“^. The part 
between the square brackets in Equation (20) behaves as 

.2 


a 


1 


2 (l-gr2)2 


+ 0 


1 


(1 -g r2)3 
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This means that we can find an 0 < rg < oo such that 



- 1 - log 
< Const + 



j.d 


(1 + r 2)2 


+ 


C 

(l+r2)3 


dr, 


where C > 0 is a constant. For d < 3 both terms on the right hand side are finite and 
based on this and the boundedness for d = 1, we can conclude that KLD(1||0) is finite 
for d < 3. 


B Tables for frequentist coverage study 

The simulation study in Section]^ was run with four different priors: the PC prior (Pri- 
orPC), the Jeffreys’ rule prior (PriorJe), a uniform prior on range on a bounded interval 
combined with the Jeffreys’ prior for variance (PriorUnl) and a uniform prior on the log- 
range on a bounded interval combined with the Jeffreys’ prior for variance (PriorUn2). 
For each prior a selection of hyperparameters were tested on datasets generated from 
true ranges po = 0.1 and po = 1-0, and the frequentist coverages of the 95% credible 
intervals and the lengths of the credible intervals were estimated. For po = 0.1, PriorJe 
gave 97.0% coverage with average length 0.86 for range and 96.0% coverage with average 
length 2.7 for marginal variance, and for po = 1.0, PriorJe gave 95.4% coverage with 
average length 445 for range and 94.4% coverage with average length of 355 for marginal 
variance. The results for PriorPC is given in Section and the the results for the two 
other priors are collected in the tables: 


Prior 

po = 0.1 

O 

II 

lo 

PriorUnl 

Table 


Table 

r 

PriorUn2 

Table i 


Table 

i 
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Table 5: Frequentist coverage of 95% credible intervals for range and marginal variance 
when the true range /9o = 0.1 using PriorUnl, where the average lengths of the credible 
intervals are shown in brackets. 


(a) Range 


A\B 

2 

20 

200 

5 • 10-2 

5 • 10-3 

5 • 10-4 

0.901 [0.95] 
0.935 [0.92] 
0.948 [0.93] 

0.901 [8.6] 
0.918 [7.7] 
0.929 [7.9] 

0.847 [122] 
0.887 [110] 
0.893 [110] 


(b) Marginal variance 


A\B 

2 

20 

200 

5 • 10-2 

5 • 10-3 

5 • 10-4 

0.952 [3.5] 
0.945 [3.3] 
0.953 [3.3] 

0.941 [29] 
0.937 [27] 
0.925 [27] 

0.895 [460] 
0.907 [410] 
0.921 [412] 


Table 6: Frequentist coverage of 95% credible intervals for range and marginal variance 
when the true range po = 0.1 using PriorUn2, where the average lengths of the credible 
intervals are shown in brackets. 


(a) Range 


A\B 

2 

20 

200 

5 • 10-2 

5 • 10-3 

5 • 10-4 

0.986 [0.47] 
0.976 [0.44] 
0.932 [0.40] 

0.979 [0.84] 
0.950 [0.81] 
0.945 [0.70] 

0.988 [1.1] 
0.966 [1.0] 
0.944 [1.3] 


(b) Marginal variance 


A\B 

2 

20 

200 

5 • 10-2 
5 • 10-3 
5 • 10-4 

0.949 [2.0] 
0.968 [1.8] 
0.948 [1.7] 

0.962 [2.9] 
0.960 [2.6] 
0.960 [2.4] 

0.965 [3.6] 
0.959 [3.2] 
0.949 [3.7] 
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Table 7: Frequentist coverage of 95% credible intervals for range and marginal variance 
when the true range po = 1 using PriorUnl, where the average lengths of the credible 
intervals are shown in brackets. 


(a) Range 


A\B 

2 

20 

200 

5•10-2 

5•10-3 

5•10-4 

0.995 [1.5] 
0.996 [1.5] 
0.994 [1.5] 

0.831 [18] 
0.818 [18] 
0.844 [18] 

0.593 [188] 
0.539 [188] 
0.537 [188] 


(b) Marginal variance 


A\B 

2 

20 

200 

5 • 10-2 

5 • 10-3 

5 • 10-4 

0.979 [2.0] 
0.979 [2.0] 
0.969 [2.0] 

0.857 [20] 
0.821 [20] 
0.828 [20] 

0.614 [208] 
0.585 [205] 
0.561 [206] 


Table 8: Frequentist coverage of 95% credible intervals for range and marginal variance 
when the true range pQ = 1 using PriorUn2, where the average lengths of the credible 
intervals are shown in brackets. 


(a) Range 


A\B 

2 

20 

200 

5•10-2 

5•10-3 

5•10-4 

0.980 [1.5] 
0.974 [1.5] 
0.964 [1.5] 

0.959 [12] 
0.954 [12] 
0.953 [13] 

0.933 [69] 
0.954 [67] 
0.956 [68] 


(b) Marginal variance 


A\B 

2 

20 

200 

5 • 10-2 

5 • 10-3 

5 • 10-4 

0.955 [1.8] 
0.962 [1.8] 
0.939 [1.8] 

0.952 [12] 
0.943 [12] 
0.946 [12] 

0.945 [61] 
0.941 [60] 
0.953 [60] 
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